Towards a fast parallel sparse matrix-vector multiplication
Abstract
The sparse matrix-vector product is an important computational kernel that runs inefficiently on many computers with super-scalar RISC processors. In this paper we analyse the performance of the sparse matrix-vector product with symmetric matrices originating from the finite element method (FEM) and describe techniques that lead to a fast implementation. It is shown how these optimisations can be incorporated into an efficient parallel implementation using message passing. We conduct numerical experiments on many different machines and show that our optimisations speed up the sparse matrix-vector multiplication substantially.
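For orientation, the kernel discussed in the abstract can be sketched as follows. This is a minimal, serial illustration and not the paper's implementation. It assumes the symmetric matrix is held in compressed sparse row (CSR) form with only the upper triangle (including the diagonal) stored, so each off-diagonal entry is read once and used twice. The function name `symmetric_spmv` and the array names `row_ptr`, `col_idx` and `values` are hypothetical.

```python
def symmetric_spmv(n, row_ptr, col_idx, values, x):
    """Compute y = A x for a symmetric n-by-n matrix A.

    Assumption: only the upper triangle of A (diagonal included) is stored
    in CSR form, i.e. row i owns entries values[row_ptr[i]:row_ptr[i + 1]]
    with column indices col_idx[k] >= i.
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = values[k]
            # Contribution of the stored entry A[i, j].
            y[i] += a * x[j]
            # Contribution of the mirrored entry A[j, i], which is not stored.
            if j != i:
                y[j] += a * x[i]
    return y


# Small example: A = [[4, 1, 0], [1, 3, 2], [0, 2, 5]], upper triangle only.
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
y = symmetric_spmv(3, row_ptr, col_idx, values, x)
```

Storing one triangle roughly halves the matrix data that has to be streamed from memory, which is one common reason to exploit symmetry in this kernel. Whether the paper uses this particular storage scheme is not stated in the abstract.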
Similar resources
Two-dimensional cache-oblivious sparse matrix-vector multiplication
In earlier work, we presented a one-dimensional cache-oblivious sparse matrix–vector (SpMV) multiplication scheme which has its roots in one-dimensional sparse matrix partitioning. Partitioning is often used in distributed-memory parallel computing for the SpMV multiplication, an important kernel in many applications. A logical extension is to move towards using a two-dimensional partitioning. ...
Optimizing Parallel Sparse Matrix-Vector Multiplication by Partitioning
Sparse matrix times vector multiplication is an important kernel in scientific computing. We study how to optimize the performance of this operation in parallel by reducing communication. We review existing approaches and present a new partitioning method for symmetric matrices. Our method is simple and can be implemented using existing software for hypergraph partitioning. Experimental results...
A General Graph Model for Representing Exact Communication Volume in Parallel Sparse Matrix-Vector Multiplication
In this paper, we present a new graph model of sparse matrix decomposition for parallel sparse matrix–vector multiplication. Our model differs from previous graph-based approaches in two main respects. Firstly, our model is based on edge colouring rather than vertex partitioning. Secondly, our model is able to correctly quantify and minimise the total communication volume of the parallel sparse...
Efficient Multicore Sparse Matrix-Vector Multiplication for Finite Element Electromagnetics on the Cell-BE processor
Multicore systems are rapidly becoming a dominant industry trend for accelerating electromagnetics computations, driving researchers to address parallel programming paradigms early in application development. We present a new sparse representation and a two level partitioning scheme for efficient sparse matrix-vector multiplication on multicore systems, and show results for a set of finite elem...
Data-parallel programming with Intel Array Building Blocks (ArBB)
Intel Array Building Blocks is a high-level data-parallel programming environment designed to produce scalable and portable results on existing and upcoming multi- and many-core platforms. We have chosen several mathematical kernels (a dense matrix-matrix multiplication, a sparse matrix-vector multiplication, a 1-D complex FFT and a conjugate gradients solver) as synthetic benchmarks and representa...